Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
Wu, Peilin, Zhang, Mian, Zhang, Xinlu, Du, Xinya, Chen, Zhiyu Zoey
Agentic Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by enabling dynamic, multi-step reasoning and information retrieval. However, these systems often exhibit sub-optimal search behaviors such as over-search (retrieving redundant information) and under-search (failing to retrieve necessary information), which hinder efficiency and reliability. This work formally defines and quantifies these behaviors, revealing their prevalence across multiple QA datasets and agentic RAG systems (e.g., one model could have avoided searching in 27.7% of its search steps). Furthermore, we demonstrate a crucial link between these inefficiencies and the models' uncertainty about their own knowledge boundaries: response accuracy correlates with the model's uncertainty in its search decisions. To address this, we propose $β$-GRPO, a reinforcement learning-based training method that incorporates a confidence threshold to reward high-certainty search decisions. Experiments on seven QA benchmarks show that $β$-GRPO equips a 3B model with better agentic RAG ability, outperforming other strong baselines with a 4% higher average exact match score.
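The abstract's core idea of rewarding only high-certainty search decisions can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the all-or-nothing gating on the minimum decision confidence, and the reward values are assumptions.

```python
def beta_grpo_reward(answer_correct, search_confidences, beta=0.7):
    """Hedged sketch of a beta-GRPO-style reward (names illustrative).

    answer_correct: bool, exact-match outcome of the final answer.
    search_confidences: floats, the model's confidence in each
        search/no-search decision along the trajectory.
    beta: confidence threshold; only high-certainty decisions earn credit.
    """
    if not answer_correct:
        return 0.0
    if not search_confidences:
        return 1.0  # no search decisions to assess
    # Grant the outcome reward only when every search decision was made
    # with confidence at or above the threshold beta.
    return 1.0 if min(search_confidences) >= beta else 0.0
```

Such a reward, plugged into a GRPO-style trainer, pushes the policy toward searching only when it is genuinely uncertain about its own knowledge.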
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
Wu, Peilin, Zhang, Mian, Wan, Kun, Zhao, Wentian, He, Kaiyu, Du, Xinya, Chen, Zhiyu
Agentic RAG is a powerful technique for incorporating external information that LLMs lack, enabling better problem solving and question answering. However, suboptimal search behaviors are widespread, such as over-search (retrieving information already known) and under-search (failing to search when necessary), which lead to unnecessary overhead and unreliable outputs. Current training methods, which typically rely on outcome-based rewards in an RL framework, lack the fine-grained control needed to address these inefficiencies. To overcome this, we introduce Hierarchical Process Rewards for Efficient agentic RAG (HiPRAG), a training methodology that incorporates a fine-grained, knowledge-grounded process reward into RL training. Our approach evaluates the necessity of each search decision on the fly by decomposing the agent's reasoning trajectory into discrete, parsable steps. We then apply a hierarchical reward function that provides an additional bonus based on the proportion of optimal search and non-search steps, on top of commonly used outcome and format rewards. Experiments with the Qwen2.5 and Llama-3.2 models across seven diverse QA benchmarks show that our method achieves average accuracies of 65.4% (3B) and 67.2% (7B). This is accomplished while improving search efficiency, reducing the over-search rate to just 2.3% and concurrently lowering the under-search rate. These results demonstrate the efficacy of optimizing the reasoning process itself, not just the final outcome. Further experiments and analysis show that HiPRAG generalizes well across a wide range of RL algorithms, model families, sizes, and types. This work demonstrates the importance and potential of fine-grained control through RL for improving the efficiency and optimality of reasoning in search agents.
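The hierarchical structure described above (a process bonus layered on top of outcome and format rewards) can be sketched as a toy function. The weights, field names, and gating of the bonus on a correct outcome are assumptions for illustration, not HiPRAG's published formula.

```python
def hiprag_reward(outcome_ok, format_ok, steps, bonus_weight=0.2):
    """Sketch of a hierarchical process reward (illustrative values).

    steps: list of dicts like {"searched": bool, "optimal": bool},
        where "optimal" records whether the search/no-search choice
        was judged necessary on the fly.
    """
    reward = 0.0
    if format_ok:
        reward += 0.1          # format reward
    if outcome_ok:
        reward += 1.0          # outcome reward
        if steps:
            # Bonus proportional to the share of optimal steps: this
            # penalizes both over-search (an unneeded search step) and
            # under-search (a non-search step that should have searched).
            frac_optimal = sum(s["optimal"] for s in steps) / len(steps)
            reward += bonus_weight * frac_optimal
    return reward
```

Gating the bonus on the outcome keeps the process signal from overwhelming correctness, which is the hierarchy the abstract refers to.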
Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents
Wang, Yiding, Wei, Zhepei, Zhu, Xinyu, Meng, Yu
Enabling large language models (LLMs) to utilize search tools offers a promising path to overcoming fundamental limitations such as knowledge cutoffs and hallucinations. Recent work has explored reinforcement learning (RL) for training search-augmented agents that interleave reasoning and retrieval before answering. These approaches usually rely on outcome-based rewards (e.g., exact match), implicitly assuming that optimizing for final answers will also yield effective intermediate search behaviors. Our analysis challenges this assumption: we uncover multiple systematic deficiencies in search that arise under outcome-only training and ultimately degrade final answer quality, including failure to invoke tools, invalid queries, and redundant searches. To address these shortcomings, we introduce DeSA (Decoupling Search-and-Answering), a simple two-stage training framework that explicitly separates search optimization from answer generation. In Stage 1, agents are trained to improve search effectiveness with retrieval recall-based rewards. In Stage 2, outcome rewards are employed to optimize final answer generation. Across seven QA benchmarks, DeSA-trained agents consistently improve search behaviors, delivering substantially higher search recall and answer accuracy than outcome-only baselines. Notably, DeSA outperforms single-stage training approaches that simultaneously optimize recall and outcome rewards, underscoring the necessity of explicitly decoupling the two objectives.
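The two-stage decoupling above can be summarized in a small reward switch. This is a sketch under stated assumptions: the function name, document-ID interface, and binary outcome reward are illustrative, though Stage 1's retrieval-recall reward and Stage 2's outcome reward follow the abstract.

```python
def desa_stage_reward(stage, retrieved_ids, gold_ids, answer_correct):
    """Sketch of DeSA's decoupled rewards (interfaces assumed).

    Stage 1 rewards retrieval recall over gold evidence documents;
    Stage 2 rewards only the final-answer outcome.
    """
    if stage == 1:
        gold = set(gold_ids)
        if not gold:
            return 0.0
        # Fraction of gold evidence documents actually retrieved.
        return len(set(retrieved_ids) & gold) / len(gold)
    elif stage == 2:
        return 1.0 if answer_correct else 0.0
    raise ValueError("stage must be 1 or 2")
```

Training first against the Stage 1 signal and only then against Stage 2 is what separates search optimization from answer generation.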
The Power of Framing: How News Headlines Guide Search Behavior
Poudel, Amrit, Milkowski, Maria, Weninger, Tim
Search engines play a central role in how people gather information, but subtle cues like headline framing may influence not only what users believe but also how they search. While framing effects on judgment are well documented, their impact on subsequent search behavior is less understood. We conducted a controlled experiment where participants issued queries and selected from headlines filtered by specific linguistic frames. Headline framing significantly shaped follow-up queries: conflict and strategy frames disrupted alignment with prior selections, while episodic frames led to more concrete queries than thematic ones. We also observed modest short-term frame persistence that declined over time. These results suggest that even brief exposure to framing can meaningfully alter the direction of users' information-seeking behavior.
Lucy: edgerunning agentic web search on mobile with machine generated task vectors
Dao, Alan, Vu, Dinh Bach, Nguyen, Alex, Buppodom, Norapat
Small language models (SLMs) are inherently limited in knowledge-intensive tasks due to their constrained capacity. While test-time computation offers a path to enhanced performance, most approaches treat reasoning as a fixed or heuristic process. In this work, we propose a new paradigm: viewing the model's internal reasoning, delimited by
OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent
Chen, Bowen, Wang, Zhao, Takamatsu, Shingo
Keyword decision in Sponsored Search Advertising is critical to the success of ad campaigns. While LLM-based methods offer automated keyword generation, they face three major limitations: reliance on large-scale query-keyword pair data, lack of online multi-objective performance monitoring and optimization, and weak quality control in keyword selection. These issues hinder the agentic use of LLMs in fully automating keyword decisions by monitoring and reasoning over key performance indicators such as impressions, clicks, conversions, and CTA effectiveness. To overcome these challenges, we propose OMS, a keyword generation framework that is On-the-fly (requires no training data, monitors online performance, and adapts accordingly), Multi-objective (employs agentic reasoning to optimize keywords based on multiple performance metrics), and Self-reflective (agentically evaluates keyword quality). Experiments on benchmarks and real-world ad campaigns show that OMS outperforms existing methods; ablation and human evaluations confirm the effectiveness of each component and the quality of generated keywords.
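The multi-objective monitoring the abstract describes amounts to scoring each keyword against several KPIs at once. A minimal sketch, assuming a simple weighted sum over normalized metrics; the metric names and weights are illustrative, not OMS's actual scoring logic.

```python
def score_keyword(metrics, weights=None):
    """Sketch of multi-objective keyword scoring (weights assumed).

    metrics: dict of normalized KPI values in [0, 1], e.g.
        {"impressions": ..., "clicks": ..., "conversions": ..., "cta": ...}.
    """
    weights = weights or {"impressions": 0.1, "clicks": 0.3,
                          "conversions": 0.5, "cta": 0.1}
    # Weighted sum; missing metrics contribute zero.
    return sum(w * metrics.get(k, 0.0) for k, w in weights.items())
```

An agentic loop would re-score keywords as fresh KPI data arrives and prune or regenerate the low scorers, which is the "on-the-fly" aspect.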
Comparing Optimization Algorithms Through the Lens of Search Behavior Analysis
Cenikj, Gjorgjina, Petelin, Gašper, Eftimov, Tome
The field of numerical optimization has recently seen a surge in the development of "novel" metaheuristic algorithms, inspired by metaphors derived from natural or human-made processes, which have been widely criticized for obscuring meaningful innovations and failing to distinguish themselves from existing approaches. Aiming to address these concerns, we investigate the applicability of statistical tests for comparing algorithms based on their search behavior. We utilize the cross-match statistical test to compare multivariate distributions and assess the solutions produced by 114 algorithms from the MEALPY library. These findings are incorporated into an empirical analysis aiming to identify algorithms with similar search behaviors.
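The cross-match statistic mentioned above can be illustrated on small samples: pool the solutions produced by two algorithms, pair all pooled points by a minimum-distance perfect matching, and count pairs whose members come from different samples; a low count suggests the two search-behavior distributions differ. The sketch below uses brute-force matching (fine for tiny, even-sized pools) and omits the p-value computation of the full test.

```python
def _euclid(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

def _min_perfect_matching(points):
    """Brute-force minimum-distance perfect matching (even # of points)."""
    def rec(remaining):
        if not remaining:
            return 0.0, []
        i = remaining[0]
        best = (float("inf"), [])
        for j in remaining[1:]:
            rest = [k for k in remaining if k not in (i, j)]
            cost, pairs = rec(rest)
            cost += _euclid(points[i], points[j])
            if cost < best[0]:
                best = (cost, [(i, j)] + pairs)
        return best
    return rec(list(range(len(points))))[1]

def cross_match_count(sample_a, sample_b):
    """Simplified cross-match statistic over two multivariate samples."""
    pts = list(sample_a) + list(sample_b)
    labels = ["a"] * len(sample_a) + ["b"] * len(sample_b)
    pairs = _min_perfect_matching(pts)
    # Count matched pairs drawn from different samples.
    return sum(1 for i, j in pairs if labels[i] != labels[j])
```

Two well-separated samples yield few cross-sample pairs, while two draws from the same distribution interleave and yield many.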
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
Ye, Guanghao, Pham, Khiem Duc, Zhang, Xinzhi, Gopi, Sivakanth, Peng, Baolin, Li, Beibin, Kulkarni, Janardhan, Inan, Huseyin A.
Recent AI advancements, such as OpenAI's new models, are transforming LLMs into LRMs (Large Reasoning Models) that perform reasoning during inference, taking extra time and compute for higher-quality outputs. We aim to uncover the algorithmic framework for training LRMs. Methods like self-consistency, PRM, and AlphaZero suggest reasoning as guided search. We ask: what is the simplest, most scalable way to enable search in LLMs? We propose a post-training framework called Reinforcement Learning via Self-Play (RLSP). RLSP involves three steps: (1) supervised fine-tuning with human or synthetic demonstrations of the reasoning process, (2) using an exploration reward signal to encourage diverse and efficient reasoning behaviors, and (3) RL training with an outcome verifier to ensure correctness while preventing reward hacking. Our key innovation is to decouple the exploration and correctness signals during PPO training, carefully balancing them to improve performance and efficiency. Empirical studies in the math domain show that RLSP improves reasoning: on the Llama-3.1-8B-Instruct model, RLSP boosts performance by 23% on the MATH-500 test set, and on AIME 2024 math problems, Qwen2.5-32B-Instruct improved by 10% with RLSP. A more important finding, however, is that models trained with RLSP, even with the simplest exploration reward that encourages the model to take more intermediate steps, exhibited several emergent behaviors such as backtracking, exploration of ideas, and verification. These findings suggest that the RLSP framework might be enough to enable the emergence of complex reasoning abilities in LLMs when scaled. Lastly, we propose a theory for why the RLSP search strategy is well suited to LLMs, inspired by a remarkable result showing that CoT provably increases the computational power of LLMs, and that this power grows with the number of CoT steps \cite{li2024chain,merrill2023expresssive}.
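The decoupling of exploration and correctness signals can be illustrated with a toy reward. The coefficients, the step cap, and the additive combination are assumptions for this sketch; the abstract only specifies that the simplest exploration reward encourages more intermediate steps while a verifier supplies the outcome signal.

```python
def rlsp_reward(answer_correct, num_intermediate_steps,
                explore_coeff=0.05, step_cap=20):
    """Sketch of RLSP-style decoupled signals (coefficients assumed).

    The exploration bonus is kept separate from the verifier's outcome
    reward, so exploration alone cannot substitute for correctness.
    """
    outcome = 1.0 if answer_correct else 0.0
    # Simplest exploration reward: encourage intermediate reasoning
    # steps, capped so the policy cannot pad indefinitely.
    explore = explore_coeff * min(num_intermediate_steps, step_cap)
    return outcome + explore
```

In an actual PPO setup the two terms would be balanced (and possibly scheduled) rather than naively summed, which is the careful balancing the abstract mentions.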
Reproducing and Extending Experiments in Behavioral Strategy with Large Language Models
Albert, Daniel, Billinger, Stephan
Two prominent approaches have emerged to advance our understanding of these microfoundations of strategy: computational work and human lab experiments. Agent-based computational simulations have sharpened our understanding of performance and learning consequences stemming from differences in individuals' cognition (Csaszar and Levinthal 2016, Gavetti and Levinthal 2000, Knudsen and Srikanth 2014, Winter et al. 2007). Additionally, scholars have increasingly designed experiments to study human responses within various tasks, such as searching for high-performing alternatives in unknown decision-spaces (Bergenholtz et al. 2023, Billinger et al. 2014, 2021, Richter et al. 2023), self-selecting into specific organizational tasks (Raveendran et al. 2022), exhibiting organizational voting behavior (Piezunka and Schilke 2023), and making innovation choices in response to different organizational contingencies (Klingebiel 2022). Despite significant strides, a key challenge in advancing behavioral strategy lies in building and testing theories of individual-level cognition and its effects on the revealed decisions that our field typically focuses on. More theoretical development and empirical testing are needed to understand when and why decision-makers follow particular heuristics in specific situations, and what task factors influence their cognitive processes.
Stable Tool-Use with Flexible Musculoskeletal Hands by Learning the Predictive Model of Sensor State Transition
Kawaharazuka, Kento, Tsuzuki, Kei, Onitsuka, Moritaka, Asano, Yuki, Okada, Kei, Kawasaki, Koji, Inaba, Masayuki
The flexible under-actuated musculoskeletal hand is superior in its adaptability and impact resistance. On the other hand, since the relationship between its sensors and actuators cannot be uniquely determined, almost all of its controllers are feedforward. When grasping and using a tool, the contact state of the hand gradually changes due to the inertia of the tool or the impact of the action, so the initial contact state is difficult to maintain. In this study, we propose a system that trains a predictive network of sensor state transitions from the actual robot's sensor information and maintains the initial contact state through feedback control using that network. We conduct experiments on hitting with a hammer, vacuuming, and sweeping with a broom, and verify the effectiveness of the proposed system.
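The feedback idea above can be sketched abstractly: given a learned predictor of sensor-state transitions, choose the muscle command whose predicted next sensor state stays closest to the initial contact state. The function and parameter names are illustrative, and candidate-based selection stands in for whatever optimizer the real controller uses.

```python
def choose_muscle_command(predict_next, current_state,
                          initial_contact_state, candidate_commands):
    """Sketch of predictor-based feedback control (interfaces assumed).

    predict_next: learned model mapping (sensor_state, command) to the
        predicted next sensor state.
    Returns the candidate command minimizing predicted deviation from
    the initial contact state.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(candidate_commands,
               key=lambda cmd: dist(predict_next(current_state, cmd),
                                    initial_contact_state))
```

Running this selection every control cycle turns the otherwise feedforward muscle control into a feedback loop anchored to the initial grasp.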